Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.

Boston is a metropolitan city and home to more than 11,000 food establishments Leung, 2018. Bostonians have plethora of dining choices that are available to them. This is an amazing perks for Bostonians who love to eat out and try out different food scenes; however, this also means that customers are faced with many different options. Choosing one will not be easy. Food and restaurant safety–measured by proxy of passing health inspections–is one of the most important factors that customers should think about when dining out. However, information on health inspection violations might not be readily accessible to customers. For example, Boston CBS wrote that “almost every Boston restaurant gets an”A" grade, even those with flarant health code records“, and that these grades do not show the full history of restaurants’ violations. Restaurants that severely violated the inspection codes could regain their”A" grade after passing their next inspections; and some restaurants had shown patterns of repeatedly cleaning up their acts after failing inspections (Boston CBS). Knowing the recent statistics that there are more than 1,600 health inspection violations at Boston restaurants in 2017 Leung, 2018, this piece of information is valuable for customers. To tackle this, we are interested in looking at food establishment inspections data provided by the City of Boston to analyze the trends in health inspections violations amongs restaurants inspected in Greater Boston. Specifically, we are interested in looking at the patterns of violations across the years, food establishment types, and geographical area. #Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?

=========================================================================================================== #Data Cleaning Data downloaded at 11/8/2018 at 4:44pm. Data saved as csv file. There are 547700 observations with 26 variables.

Current info: - businessName (0 missing) - DBAName (suggest drop, reason: missing values) - LegalOwner (suggest drop, reason: missing values) - NameLast - NameFirst (suggest drop, reason: some are informative–e.g. variations on “inc”) - LICENSENO - ISSDTTM - EXPDTTM - LICSTATUS (“Active”, “Deleted”, “Inactive”) –> combine “Deleted” and “Inactive”? - LICENSECAT (“FS” “FT” “MFW” “RF”) - DESCRIPT (“Eating & Drinking” “Eating & Drinking w/ Take Out” “Mobile Food Walk On” “Retail Food”) - RESULT [1] “DATAERR” “Fail” “Failed” “HE_Closure” “HE_Fail” “HE_FailExt” [7] “HE_FAILNOR” “HE_Filed” “HE_Hearing” “HE_Hold” “HE_Misc” “HE_NotReq” [13] “HE_OutBus” “HE_Pass” “HE_TSOP” “Pass” “PassViol”

RESULT n 1 DATAERR 41 2 Fail 238 3 Failed 6 4 HE_Closure 522 5 HE_Fail 231949 6 HE_FailExt 41291 7 HE_FAILNOR 115 8 HE_Filed 64700 9 HE_Hearing 18721 10 HE_Hold 17 11 HE_Misc 94 12 HE_NotReq 9836 13 HE_OutBus 2025 14 HE_Pass 170761 15 HE_TSOP 6226 16 Pass 964 17 PassViol 2

After looking at the data, I would suggest trimming them.. 35,751 have different years between RESULTDTTM vs VIOLDTTM. I would suggest going with VIOLDTTM and ViolDesc, since we (or I) have no clue what RESULT means. 31,240 observation (5.7%) is missing the violation years. Options: 1) impute from the RESULTSDTTM, 2) complete case analysis. For this case, we stick with complete case, excluding any observations without date (thus years) of violation, description of violation (1 obs), and violation status (5744 observations). Total observations are 510,715.

keeps <- c("businessName","NameLast", "LICENSENO","LICSTATUS", "DESCRIPT","ViolDesc","VIOLDTTM", "ViolStatus", "Address","CITY", "ZIP", "ViolLevel")

library(lubridate)
violation<-test[keeps]
violation <- violation %>% mutate(year = year(VIOLDTTM))

ccvioly <- violation[!is.na(violation$year), ]
ccvioly <- ccvioly[!is.na(ccvioly$ViolDesc),]
ccvioly <- ccvioly[!is.na(ccvioly$ViolStatus),]
ccvioly <- ccvioly[!is.na(ccvioly$ZIP),]
ccvioly<-ccvioly[which(ccvioly$ViolDesc != ""),]
ccvioly<-ccvioly[which(ccvioly$ZIP!= "0"),]
summary(ccvioly)
##          businessName                             NameLast     
##  Subway        :  2243   CVS PHARMACY INC.            :  1952  
##  Dunkin Donuts :  1942   c/o Lyons Group              :  1794  
##  Mcdonalds     :  1641   STARBUCKS CORPORATION        :  1741  
##  Dunkin' Donuts:  1524   Shaw's Supermarket Inc.      :  1519  
##  Au Bon Pain   :  1485   WHOLE FOODS MARKET GROUP INC.:  1479  
##  McDonald's    :  1373   Stop & Shop Supermarket Co.  :  1375  
##  (Other)       :499807   (Other)                      :500155  
##    LICENSENO         LICSTATUS                               DESCRIPT     
##  Min.   :  4032   Active  :338921   Eating & Drinking            :247160  
##  1st Qu.: 21261   Deleted :    27   Eating & Drinking w/ Take Out:197305  
##  Median : 24711   Inactive:171067   Mobile Food Walk On          :  4095  
##  Mean   : 46539                     Retail Food                  : 61455  
##  3rd Qu.: 37246                                                           
##  Max.   :379898                                                           
##                                                                           
##                                    ViolDesc     
##  Non-Food Contact Surfaces Clean       : 40895  
##  Improper Maintenance of Walls/Ceilings: 37396  
##  Non-Food Contact Surfaces             : 33021  
##  Improper Maintenance of Floors        : 31162  
##  Food Protection                       : 28422  
##  Food Contact Surfaces Clean           : 24762  
##  (Other)                               :314357  
##     VIOLDTTM                   ViolStatus   
##  Min.   :2006-12-13 13:31:00   Fail:283352  
##  1st Qu.:2010-04-16 12:24:44   Pass:226663  
##  Median :2013-06-05 10:57:31                
##  Mean   :2013-04-24 18:18:37                
##  3rd Qu.:2016-03-15 13:05:36                
##  Max.   :2018-11-08 14:15:05                
##                                             
##                              Address                CITY       
##  1   Citywide ST                 :  3766   Boston     :189836  
##  417   Washington ST             :  2398   Dorchester : 51222  
##  800   BOYLSTON ST               :  2395   BOSTON     : 48403  
##  200   Faneuil Hall Market PL    :  2326   Roxbury    : 37799  
##  100-199   Faneuil Hall Market PL:  2282   East Boston: 26316  
##  300   FANEUIL HALL MARKET PL    :  2057   (Other)    :156419  
##  (Other)                         :494791   NA's       :    20  
##       ZIP       ViolLevel          year     
##  Min.   :2050   *   :386457   Min.   :2006  
##  1st Qu.:2115   **  : 32442   1st Qu.:2010  
##  Median :2121   *** : 91116   Median :2013  
##  Mean   :2127   1919:     0   Mean   :2013  
##  3rd Qu.:2129                 3rd Qu.:2016  
##  Max.   :2467                 Max.   :2018  
## 

We could try to categorize these violations to violation categories: (http://www.foodprotection.org/files/food-protection-trends/MarApr-14-Burke-Manes.pdf) 1) Floors, Walls, and Ceilings “Fixture’s not properly shielded” “Floors Designed Constructed Installed” “Improper Maintenance of Floors” “Improper Maintenance of Walls/Ceilings” “Walls/Ceilings Designed Constructed Installed”

  1. Food Equipment and Utensils “Clean Equipment & Utensils Storage” “Dishwashng Facilities” “Equipment Thermometers” “Food Contact Surfaces Clean” “Food Contact Surfaces Design” “Food Thermometers Provided” “Food Utensil Storage” “Improper Storage of Re-usable Utensils” “Inadequate Facilities/Cooling Methods” “Installed and Maintained” “Non-Food Contact Surfaces” “Non-Food Contact Surfaces Clean” “Pre-Flushed Scrapped Soaked” “Re-use of Single Service Articles” “Recieving/Sound Condition/Proper Temperatures” “Single Service Articles Stored Dispensed” “Test Kit Provided” “Three Compartment Sink” “Wash Rinse Water Clean Proper Temperature.” “Wiping Cloths Clean Sanitize”

  2. Food Protection “Cold Holding” “Hot Holding” “Conformance w/ Approved Procedures” “Cooking Temperatures” “Cooling”
    “Food products protected against contamination(Properly covered/secured for overnight storage).” “Food Protection” “Food Restrictions & Preparation” “Handling of Food & Ice” “PHF’s Properly Thawed” “Reduced Oxygen Packaging” “Reheating” “Reservice of PHF or Unwrapped Foods” “Separation Segregation Cross Contamination” “Washing fruits and veg’s.”

  3. Toilet and hand-washing facilities “Adequate Handwashing/Where/When/How” “Hand Cleaner Drying Tissue Signage” “Location Accessible”
    “Low Sanitizer Alarm” “Number Convenient” “Prevention of Contamination from Hands” “Separation/Sanitizer Criteria” “Toilet Enclosed Clean”

  4. Food “Approved Food or Color Additives” “Approved Source” “Food Container Labels” “Labeled Common Name” “Labeling of Ingredients” “Shellstock ID” “Spoilage Unsafe Food” “Tags & Records”

  5. Plumbing “Cross Connection Back Siphonage Backflow” “Sewage and Waste Water”

  6. Lighting “Inadequate Lighting”

  7. Insect, Rodent, Animal Control “Adequate Number Frequency Vermin Proof” “Insects Rodents Animals” “Parasite Destruction” “Pesticide Usage”

  8. Garbage and Refuse Disposal “Improper Cleaning of Receptacles” “Outside Storage Improperly Maintained”

  9. Personnel “Clean Cloths Hair Restraint” “Good Hygienic Practices” “Non-Compliance w/Employee Health Policy” “Person in charge Assigned” “Personnel w/ Infections Restricted/Excluded” “PIC Knowledge” PIC Performing Duties" “Safe Food Handeling Instructions” “Times as a Public Health Control” “Tobacco”

  10. Water “Hot and Cold Water” “Unsafe Water”

  11. Dressing Rooms “Dressing Rooms Clean/Lockers Provided”

  12. Other “Anti-Choking” “Consumer Advisories”
    “Living/Sleeping Quaters/Laundry” “Medicines FirstAid Storage” “Mop Sink not Provided” “Premises Maintained” “Proper storage of clean linen” “Public Nuisance” “Rooms and Equipment Vented” “Soiled Linen Storage” “Toxic Items: Original Container”

levels(ccvioly$ViolDesc)
##  [1] "Adequate Handwashing/Where/When/How"                                                           
##  [2] "Adequate Number  Frequency Vermin Proof"                                                       
##  [3] "Anti-Choking"                                                                                  
##  [4] "Approved Food  or Color Additives"                                                             
##  [5] "Approved Source"                                                                               
##  [6] "Clean Cloths  Hair Restraint"                                                                  
##  [7] "Clean Equipment & Utensils Storage"                                                            
##  [8] "Cold Holding"                                                                                  
##  [9] "Conformance w/ Approved Procedures"                                                            
## [10] "Consumer Advisories"                                                                           
## [11] "Cooking Temperatures"                                                                          
## [12] "Cooling"                                                                                       
## [13] "Cross Connection Back Siphonage  Backflow"                                                     
## [14] "Dishwashng Facilities"                                                                         
## [15] "Dressing Rooms Clean/Lockers Provided"                                                         
## [16] "Equipment Thermometers"                                                                        
## [17] "Fixture's not properly shielded"                                                               
## [18] "Floors Designed Constructed Installed"                                                         
## [19] "Food Contact Surfaces Clean"                                                                   
## [20] "Food Contact Surfaces Design"                                                                  
## [21] "Food Container Labels"                                                                         
## [22] "Food products protected against contamination(Properly covered/secured for overnight storage)."
## [23] "Food Protection"                                                                               
## [24] "Food Restrictions & Preparation"                                                               
## [25] "Food Thermometers Provided"                                                                    
## [26] "Food Utensil Storage"                                                                          
## [27] "Good Hygienic Practices"                                                                       
## [28] "Hand Cleaner  Drying  Tissue Signage"                                                          
## [29] "Handling of Food & Ice"                                                                        
## [30] "Hot and Cold Water"                                                                            
## [31] "Hot Holding"                                                                                   
## [32] "Improper Cleaning of Receptacles"                                                              
## [33] "Improper Maintenance of Floors"                                                                
## [34] "Improper Maintenance of Walls/Ceilings"                                                        
## [35] "Improper Storage of Re-usable Utensils"                                                        
## [36] "Inadequate Facilities/Cooling Methods"                                                         
## [37] "Inadequate Lighting"                                                                           
## [38] "Insects  Rodents  Animals"                                                                     
## [39] "Installed and Maintained"                                                                      
## [40] "Labeled  Common Name"                                                                          
## [41] "Labeling of Ingredients"                                                                       
## [42] "Living/Sleeping Quaters/Laundry"                                                               
## [43] "Location  Accessible"                                                                          
## [44] "Low Sanitizer Alarm"                                                                           
## [45] "Medicines  FirstAid Storage"                                                                   
## [46] "Mop Sink not Provided"                                                                         
## [47] "Non-Compliance w/Employee Health Policy"                                                       
## [48] "Non-Food Contact Surfaces"                                                                     
## [49] "Non-Food Contact Surfaces Clean"                                                               
## [50] "Number  Convenient"                                                                            
## [51] "Outside Storage Improperly Maintained"                                                         
## [52] "Parasite Destruction"                                                                          
## [53] "Person in charge Assigned"                                                                     
## [54] "Personnel w/ Infections Restricted/Excluded"                                                   
## [55] "Pesticide Usage"                                                                               
## [56] "PHF's Properly Thawed"                                                                         
## [57] "PIC Knowledge"                                                                                 
## [58] "PIC Performing Duties"                                                                         
## [59] "Pre-Flushed  Scrapped  Soaked"                                                                 
## [60] "Premises Maintained"                                                                           
## [61] "Prevention of Contamination from Hands"                                                        
## [62] "Proper storage of clean linen"                                                                 
## [63] "Public Nuisance"                                                                               
## [64] "Re-use of Single Service Articles"                                                             
## [65] "Recieving/Sound Condition/Proper Temperatures"                                                 
## [66] "Reduced Oxygen Packaging"                                                                      
## [67] "Reheating"                                                                                     
## [68] "Reservice of PHF or Unwrapped Foods"                                                           
## [69] "Rooms and Equipment Vented"                                                                    
## [70] "Safe Food Handeling Instructions"                                                              
## [71] "Separation  Segregation Cross Contamination"                                                   
## [72] "Separation/Sanitizer Criteria"                                                                 
## [73] "Sewage and Waste Water"                                                                        
## [74] "Shellstock ID"                                                                                 
## [75] "Single Service Articles Stored Dispensed"                                                      
## [76] "Soiled Linen Storage"                                                                          
## [77] "Spoilage Unsafe Food"                                                                          
## [78] "Tags & Records"                                                                                
## [79] "Test Kit Provided"                                                                             
## [80] "Three Compartment Sink"                                                                        
## [81] "Times as a Public Health Control"                                                              
## [82] "Tobacco"                                                                                       
## [83] "Toilet Enclosed Clean"                                                                         
## [84] "Toxic Items: Original Container"                                                               
## [85] "Unsafe Water"                                                                                  
## [86] "Walls/Ceilings Designed Constructed Installed"                                                 
## [87] "Wash Rinse Water Clean Proper Temperature."                                                    
## [88] "Washing fruits and veg's."                                                                     
## [89] "Wiping Cloths  Clean  Sanitize"
ccvioly %>% group_by(ViolDesc) %>% summarise(n=length(businessName))
## # A tibble: 88 x 2
##    ViolDesc                                    n
##    <fct>                                   <int>
##  1 Adequate Handwashing/Where/When/How      3289
##  2 Adequate Number  Frequency Vermin Proof  5275
##  3 Anti-Choking                             1092
##  4 Approved Food  or Color Additives          64
##  5 Approved Source                          1166
##  6 Clean Cloths  Hair Restraint             4844
##  7 Clean Equipment & Utensils Storage       4545
##  8 Cold Holding                            12781
##  9 Conformance w/ Approved Procedures       1393
## 10 Consumer Advisories                      3526
## # ... with 78 more rows
ccvioly %>% group_by(ViolStatus) %>% summarise(n=length(businessName))
## # A tibble: 2 x 2
##   ViolStatus      n
##   <fct>       <int>
## 1 Fail       283352
## 2 Pass       226663
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ readr   1.1.1
## ✔ tibble  1.4.2     ✔ purrr   0.2.5
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ ggplot2 3.1.0     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date()        masks base::date()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ lubridate::intersect()   masks base::intersect()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ lubridate::setdiff()     masks base::setdiff()
## ✖ lubridate::union()       masks base::union()
ccvioly$violcats <- fct_collapse(ccvioly$ViolDesc, 
         "Floors, Walls, and Ceilings" = c("Fixture's not properly shielded","Floors Designed Constructed Installed","Improper Maintenance of Floors","Improper Maintenance of Walls/Ceilings","Walls/Ceilings Designed Constructed Installed"), 
         "Food Equipment and Utensils" = c("Clean Equipment & Utensils Storage", 
"Dishwashng Facilities", "Equipment Thermometers","Food Contact Surfaces Clean","Food Contact Surfaces Design", "Food Thermometers Provided", "Food Utensil Storage" ,"Improper Storage of Re-usable Utensils", "Inadequate Facilities/Cooling Methods" ,"Installed and Maintained", "Non-Food Contact Surfaces" , "Non-Food Contact Surfaces Clean" ,"Pre-Flushed  Scrapped  Soaked", "Re-use of Single Service Articles","Recieving/Sound Condition/Proper Temperatures", "Single Service Articles Stored Dispensed", "Test Kit Provided"
,"Three Compartment Sink", "Wash Rinse Water Clean Proper Temperature.", "Wiping Cloths  Clean  Sanitize"),
"Food Protection" = c("Cold Holding" ,"Hot Holding" ,"Conformance w/ Approved Procedures","Cooking Temperatures","Cooling","Food products protected against contamination(Properly covered/secured for overnight storage).","Food Protection" ,"Food Restrictions & Preparation","Handling of Food & Ice","PHF's Properly Thawed","Reduced Oxygen Packaging" ,"Reheating","Reservice of PHF or Unwrapped Foods","Separation  Segregation Cross Contamination","Washing fruits and veg's."), 
"Toilet and hand-washing facilities" = c("Adequate Handwashing/Where/When/How","Hand Cleaner  Drying  Tissue Signage","Location  Accessible", "Low Sanitizer Alarm", "Number  Convenient" , "Prevention of Contamination from Hands", "Separation/Sanitizer Criteria", "Toilet Enclosed Clean"),
"Food" = c("Approved Food  or Color Additives",
"Approved Source" ,"Food Container Labels" ,"Labeled  Common Name","Labeling of Ingredients" ,"Shellstock ID","Spoilage Unsafe Food","Tags & Records"),
"Plumbing"=c("Cross Connection Back Siphonage  Backflow",
"Sewage and Waste Water"), 
"Lighting"=c("Inadequate Lighting"), 
"Insect, Rodent, Animal Control"=c("Adequate Number  Frequency Vermin Proof","Insects  Rodents  Animals","Parasite Destruction", "Pesticide Usage"), 
"Garbage and Refuse Disposal" = c("Improper Cleaning of Receptacles","Outside Storage Improperly Maintained"), 
"Personnel"= c("Clean Cloths  Hair Restraint","Good Hygienic Practices","Non-Compliance w/Employee Health Policy","Person in charge Assigned" ,"Personnel w/ Infections Restricted/Excluded","PIC Knowledge","PIC Performing Duties", "Safe Food Handeling Instructions","Times as a Public Health Control", "Tobacco"), 
"Water" = c("Hot and Cold Water","Unsafe Water"), 
"Dressing Rooms" =c("Dressing Rooms Clean/Lockers Provided"), 
"Others"= c("Anti-Choking", "Consumer Advisories", "Living/Sleeping Quaters/Laundry","Medicines  FirstAid Storage","Mop Sink not Provided","Premises Maintained","Proper storage of clean linen","Public Nuisance" ,"Rooms and Equipment Vented","Soiled Linen Storage","Toxic Items: Original Container")

)
      
levels(ccvioly$violcats)
##  [1] "Toilet and hand-washing facilities"
##  [2] "Insect, Rodent, Animal Control"    
##  [3] "Others"                            
##  [4] "Food"                              
##  [5] "Personnel"                         
##  [6] "Food Equipment and Utensils"       
##  [7] "Food Protection"                   
##  [8] "Plumbing"                          
##  [9] "Dressing Rooms"                    
## [10] "Floors, Walls, and Ceilings"       
## [11] "Water"                             
## [12] "Garbage and Refuse Disposal"       
## [13] "Lighting"
sumx <-ccvioly %>%  group_by(LICENSENO,year,violcats, ViolStatus) %>% summarise(n=n(), name=paste(unique(businessName)), LICSTATUS=paste(unique(LICSTATUS)), DESCRIPT=paste(unique(DESCRIPT)), ZIP=paste(unique(ZIP))) 


sumviol <- ccvioly %>% filter(., ViolStatus == "Fail") %>% group_by(LICENSENO,year,violcats) %>% 
  summarise(n=n(), name=paste(unique(businessName)), LICSTATUS=paste(unique(LICSTATUS)), DESCRIPT=paste(unique(DESCRIPT)),
            YEAR=paste(unique(year)), violcat=paste(unique(violcats)), ZIP=paste(unique(ZIP))) 

sumpass <- ccvioly %>% filter(., ViolStatus == "Pass") %>% group_by(LICENSENO,year,violcats) %>% 
  summarise(n=n(), name=paste(unique(businessName)), LICSTATUS=paste(unique(LICSTATUS)), DESCRIPT=paste(unique(DESCRIPT)),
            YEAR=paste(unique(year)), violcat=paste(unique(violcats)), ZIP=paste(unique(ZIP))) 

sumx is dataset that groups the data by license number, year of violation, violation categories, and violation status; collapse them based on these, and count the # of rows based on these. Thus it represents for a particular food establishment at year X, what would be the # of failing/passing violations for a particular violation types.

finalviol <- sumx[,c(1,6,8,9,2,7,3,4,5)]
head(finalviol)
## # A tibble: 6 x 9
## # Groups:   LICENSENO, year, violcats [6]
##   LICENSENO name  DESCRIPT ZIP    year LICSTATUS violcats ViolStatus     n
##       <int> <chr> <chr>    <chr> <dbl> <chr>     <fct>    <fct>      <int>
## 1      4032 Stop… Retail … 2131   2007 Inactive  Toilet … Fail           2
## 2      4032 Stop… Retail … 2131   2007 Inactive  Insect,… Fail           2
## 3      4032 Stop… Retail … 2131   2007 Inactive  Food Eq… Fail          10
## 4      4032 Stop… Retail … 2131   2007 Inactive  Food Pr… Fail           2
## 5      4032 Stop… Retail … 2131   2007 Inactive  Floors,… Fail           4
## 6      4032 Stop… Retail … 2131   2007 Inactive  Garbage… Fail           4
write.csv(finalviol, 'finalviol.csv')

We then uploaded this finalviol.csv to github

=========================================================================================================== #Restaurant Violations Analysis based on Years of Inspections The goal of this analysis is to look at the trends of restaurant violations based on the years of inspections performed.

library(tidyverse)
library(dplyr)
library(ggplot2)

Uploading the finalviol dataset

Starting by exploring the contents of the datset.

summary(finalviol)
##        X            LICENSENO                  name       
##  Min.   :     1   Min.   :  4032   Subway        :  1173  
##  1st Qu.: 55772   1st Qu.: 21420   Dunkin Donuts :  1152  
##  Median :111544   Median : 25121   Mcdonalds     :   822  
##  Mean   :111544   Mean   : 48590   Dunkin' Donuts:   816  
##  3rd Qu.:167316   3rd Qu.: 67870   Au Bon Pain   :   676  
##  Max.   :223087   Max.   :379898   Burger King   :   649  
##                                    (Other)       :217799  
##                           DESCRIPT           ZIP            year     
##  Eating & Drinking            :102258   Min.   :2050   Min.   :2006  
##  Eating & Drinking w/ Take Out: 86013   1st Qu.:2115   1st Qu.:2010  
##  Mobile Food Walk On          :  2261   Median :2122   Median :2013  
##  Retail Food                  : 32555   Mean   :2128   Mean   :2013  
##                                         3rd Qu.:2130   3rd Qu.:2016  
##                                         Max.   :2467   Max.   :2018  
##                                                                      
##     LICSTATUS                                    violcats    
##  Active  :148255   Food Equipment and Utensils       :49731  
##  Deleted :    12   Floors, Walls, and Ceilings       :38450  
##  Inactive: 74820   Food Protection                   :30427  
##                    Toilet and hand-washing facilities:25969  
##                    Others                            :20704  
##                    Personnel                         :17433  
##                    (Other)                           :40373  
##  ViolStatus          n         
##  Fail:117866   Min.   : 1.000  
##  Pass:105221   1st Qu.: 1.000  
##                Median : 1.000  
##                Mean   : 2.286  
##                3rd Qu.: 3.000  
##                Max.   :64.000  
## 
table(finalviol$year, finalviol$ViolStatus)
##       
##         Fail  Pass
##   2006     0     2
##   2007  5498  6035
##   2008 10865  9459
##   2009  9928  8588
##   2010  8879  7657
##   2011  8669  7462
##   2012 10730  9585
##   2013  9781  8589
##   2014 10467  9310
##   2015 10426  9349
##   2016 11519 10489
##   2017 10177  8975
##   2018 10927  9721

Since there are only two observations for 2006, I’ve filtered out these observations from the subsequent analyses.

Looking at the trend in violation status of food inspections from 2007 to 2018, we see that there is a large jump in the number of food inspections that were conducted from 2007 to 2008. The number of food inspections declines between 2009 and 2011 and then picks up and remains fairly constant starting in 2012.

finalviol %>% filter (year %in% c(2007,2008, 2009, 2010,2011, 2012, 2013, 2014, 2015, 2016, 2017,2018))%>% group_by(year) %>%
ggplot(aes(ViolStatus, fill=ViolStatus))+ geom_bar() +
  xlab ("Violation Status")+
  ggtitle("Violation Status of Food Inspections from 2007-2018")+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))+
  facet_grid(.~year)

Throughout the years, around 53% of the food inspections have resulted in a Failed inspection.

finalviol %>% filter (year !=2006) %>%
  group_by(year) %>%
  summarize(number_fail = sum(ViolStatus=="Fail"),number_pass =sum(ViolStatus=="Pass"), 
            total =  n(), percent_fail=number_fail/total)
## # A tibble: 12 x 5
##     year number_fail number_pass total percent_fail
##    <int>       <int>       <int> <int>        <dbl>
##  1  2007        5498        6035 11533        0.477
##  2  2008       10865        9459 20324        0.535
##  3  2009        9928        8588 18516        0.536
##  4  2010        8879        7657 16536        0.537
##  5  2011        8669        7462 16131        0.537
##  6  2012       10730        9585 20315        0.528
##  7  2013        9781        8589 18370        0.532
##  8  2014       10467        9310 19777        0.529
##  9  2015       10426        9349 19775        0.527
## 10  2016       11519       10489 22008        0.523
## 11  2017       10177        8975 19152        0.531
## 12  2018       10927        9721 20648        0.529
finalviol_year <- finalviol %>% filter (year !=2006) %>%
  group_by(year) %>%
  summarize(number_fail = sum(ViolStatus=="Fail"),number_pass =sum(ViolStatus=="Pass"), 
            total =  n(), percent_fail=(number_fail/total*100))
finalviol_year %>% summarize(meanfail = mean(percent_fail))
## # A tibble: 1 x 1
##   meanfail
##      <dbl>
## 1     52.7
finalviol_year %>% ggplot() +
  
  geom_line(aes (x=year, y=percent_fail), size = 1, color = "red")+ 
  
  ylim(45,55)+ 
  
  geom_point(aes (x=year, y=percent_fail), size = 2, color = "black")+
  
  xlab("Year")+
  
  ylab("% of Failed Inspection Test")+
  
  ggtitle("Percentage of Failed Inspection Test from 2007-2018")

Next,looking at the distribution of number of violations of those that Failed the Inspection Test, we see that across the years, the category with the largest count of violations is for number of violations <= 5.

finalviol %>% filter (ViolStatus=="Fail", year %in% c(2007,2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,2018))%>% 
  
  ggplot(aes(n)) +
  
  geom_histogram (binwidth = 3, color = "black")+
  
  xlim(0, 20)+ 
  
  facet_wrap(.~year, ncol=6)+ xlab("Number of Violations")+
  
  ggtitle("Distribution of no. violations of establishments that Failed Inspection Test")
## Warning: Removed 265 rows containing non-finite values (stat_bin).
## Warning: Removed 24 rows containing missing values (geom_bar).

However, we see from the boxplot below that there is a wide spread in the number of violations, with some inspections detecting over 40 counts of violations!

finalviol %>% filter (year >2007 & ViolStatus=="Fail" ) %>%
  
  ggplot(aes(x=year, y = n, group=year))+
  
  geom_boxplot()+ 
  
  ylab("Number of Violations")+
  
  ggtitle("Boxplot of No. of Violations of Establishments that Failed Inspection Test") 

Adding the total number of violations for each ViolStatus category, grouped by year:

finalviol1 <- finalviol %>% filter (year !=2006 & ViolStatus=="Fail") %>%
  
  group_by(year) %>%
  
  summarize(total_no_violations_fail = sum(n))

finalviol2 <-finalviol %>% filter (year !=2006 & ViolStatus=="Pass") %>%
  
  group_by(year) %>%
  
  summarize(total_no_violations_pass = sum(n))

finalviol_number<- left_join(finalviol1,finalviol2)
## Joining, by = "year"
finalviol_number<- finalviol_number %>%
  mutate(total_no_violations_diff = abs((total_no_violations_fail) - (total_no_violations_pass)))

In this graph below, the red line represents the trend in the total number of violations for those that failed Inspection. Similarly, the blue line represents the trend for those that Passed the Inspection. Trend lines for the two groups are very similar, with the Failed Inspections reporting between 4,000-6,000 more number of violations.

finalviol_number %>%
  
  ggplot()+
  
  geom_line(aes (x=year, y = total_no_violations_fail), size =0.5, color = "red")+
  
  geom_point(aes (x=year, y= total_no_violations_fail), size = 2, color = "black")+
  
  geom_line(mapping=aes(x=year, y=total_no_violations_pass), size=0.5, color="blue")+
  
  geom_point(aes (x=year, y= total_no_violations_pass), size = 2, color = "black")+
  
  xlab("Year") + 
  
  ylab(" Total No. of Violations ") +
  
  ggtitle("Total Number of Violations from 2007-2018") + 
  
  geom_text(aes(x= 2013, y = 15000, label="Pass"), color="blue")+
  
  geom_text(aes(x= 2010, y = 25000, label="Fail"), color="red")

This graph below shows the difference in the number of violations based on violation status.

finalviol_number %>%
  ggplot()+
  geom_line(aes (x=year, y = total_no_violations_diff), size =0.5, color = "purple")+
  geom_point(aes (x=year, y= total_no_violations_diff), size = 2, color = "black")+
  xlab("Year") + 
  ylab(" Difference in No. of Violations ") +
  ggtitle("Difference in Number of Passed vs Failed Inspection Annually") 

============================================================================================================= #Analysis based on restaurant type

The goal of this analysis is to explore differences in restaurants inspections violations based on the restaurant and restaurant type.

library(tidyverse)
library(gridExtra)
finalviol <- read.csv("https://raw.githubusercontent.com/plimasalle/Food_Project/master/finalviol.csv")
finalviol <- finalviol %>%
  filter(year > 2006, LICSTATUS == "Active") %>%
  mutate(violstatus_fail = ifelse(ViolStatus == "Fail", 1, 0))

finalviol_anyfail <- finalviol %>%
  group_by(DESCRIPT, name, ZIP, LICENSENO, year) %>%
  summarize(fails = sum(violstatus_fail), num_cat = n()) %>%
  mutate(anyfails = ifelse(fails > 0, 1, 0), fivefails = ifelse(fails > 4, 1, 0), pcnt_fails = fails/num_cat)

finalviol_anyfail %>% group_by(DESCRIPT) %>%
  summarize(mean(anyfails), n())
## # A tibble: 4 x 3
##   DESCRIPT                      `mean(anyfails)` `n()`
##   <fct>                                    <dbl> <int>
## 1 Eating & Drinking                        0.988  9988
## 2 Eating & Drinking w/ Take Out            0.984  7959
## 3 Mobile Food Walk On                      0.997   308
## 4 Retail Food                              0.981  3696

There are four main categories of restaurants in the dataset: 1. Eating & Drinking 2. Eating & Drinking w/ Take Out 3. Mobile Food Walk On 4. Retail Food

Across all years, the vast majority of restaurants had at least one failure across the violation categories inspected (ranging from 98.1% among Retail Food to 99.7% among Mobile Food Walk On restaurants).

Note: These analyses are limited to restaurants with active licenses only, but these numbers did not change substantially (fractions of percentage points) when restaurants with inactive licenses were also included.

finalviol_anyfail %>% group_by(DESCRIPT) %>%
  summarize(mean(fivefails), n())
## # A tibble: 4 x 3
##   DESCRIPT                      `mean(fivefails)` `n()`
##   <fct>                                     <dbl> <int>
## 1 Eating & Drinking                         0.338  9988
## 2 Eating & Drinking w/ Take Out             0.313  7959
## 3 Mobile Food Walk On                       0.166   308
## 4 Retail Food                               0.227  3696

Across all years, there was more variation across restaurant categories in the percent of restaurants per category that had at least five failures across violation categories. About a third of Eating and Drinking and Eating & Drinking w/ Take Out restaurants, about a fifth of Retail Food restaurants, and about a sixth of Mobile Food Walk On restuarants had at least five failures.

finalviol_anyfail %>% filter(anyfails > 0) %>%
  group_by(DESCRIPT) %>%
  summarize(mean(pcnt_fails), n())
## # A tibble: 4 x 3
##   DESCRIPT                      `mean(pcnt_fails)` `n()`
##   <fct>                                      <dbl> <int>
## 1 Eating & Drinking                          0.590  9870
## 2 Eating & Drinking w/ Take Out              0.601  7832
## 3 Mobile Food Walk On                        0.597   307
## 4 Retail Food                                0.680  3627

Among restaurants that failed at least one inspection category, Retail Food restaurants overall failed the highest percent of violation categories inspected with the average Retail Food restaurant failing 67.9% of the violation categories inspected. Eating & Drinking restaurants failed the lowest percent of categories inspected with the average Eating & Drinking restaurant failing 58.9% of violation categories inspected.

finalviol_typeyear <- finalviol_anyfail %>%
  filter(anyfails > 0) %>%
  group_by(DESCRIPT, year) %>%
  summarize(avg_pcnt_fails = mean(pcnt_fails), count = n())

ggplot(finalviol_typeyear, aes(year, avg_pcnt_fails, color = DESCRIPT)) +
  geom_line() +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Violation Categories Failed", limits = c(0, 0.8))

With the exception of Mobile Food Walk On, all restuarants that failed at least one violation category inspected tended to fail a similar percent of violation categories inspected across years. This variation for Mobile Food Walk On may be driven by the low number of Mobile Food Walk On restaurants (ranging from 1 in 2009 to a peak of 102 in 2018).

finalviol_cat <- finalviol %>%
  group_by(DESCRIPT, year, violcats) %>%
  summarize(avg_cat = mean(violstatus_fail), count = n())

ggplot(filter(finalviol_cat, DESCRIPT == "Eating & Drinking"), aes(year, avg_cat, color = violcats)) +
  geom_line() +
  ggtitle("Frequency of Violation Categories for Eating & Drinking Restuarants") +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Inspections Failed by Violation Category", limits = c(0, 0.6))

ggplot(filter(finalviol_cat, DESCRIPT == "Eating & Drinking w/ Take Out"), aes(year, avg_cat, color = violcats)) +
  geom_line() +
  ggtitle("Frequency of Violation Categories for Eating & Drinking w/ Take Out Restuarants") +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Inspections Failed by Violation Category", limits = c(0, 0.6))

ggplot(filter(finalviol_cat, DESCRIPT == "Mobile Food Walk On"), aes(year, avg_cat, color = violcats)) +
  geom_line() +
  ggtitle("Frequency of Violation Categories for Mobile Food Walk On Restuarants") +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Inspections Failed by Violation Category", limits = c(0, 1.0))

ggplot(filter(finalviol_cat, DESCRIPT == "Retail Food"), aes(year, avg_cat, color = violcats)) +
  geom_line() +
  ggtitle("Frequency of Violation Categories for Retail Food Restuarants") +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Inspections Failed by Violation Category", limits = c(0, 0.7))

Across all restaurant types, the frequency of failing by violation category is approximately the same across violation categories across all years (generally within about a five percentage point band).

finalviol_violcats <- finalviol %>%
  group_by(violcats, name, ZIP, LICENSENO, year) %>%
  summarize(fails = sum(violstatus_fail), num_cat = n()) %>%
  mutate(anyfails = ifelse(fails > 0, 1, 0), pcnt_fails = fails/num_cat)

finalviol_violcats %>% group_by(violcats) %>%
  summarize(mean(anyfails), n())
## # A tibble: 13 x 3
##    violcats                           `mean(anyfails)` `n()`
##    <fct>                                         <dbl> <int>
##  1 Dressing Rooms                                0.951   324
##  2 Floors, Walls, and Ceilings                   0.982 14463
##  3 Food                                          0.965  4377
##  4 Food Equipment and Utensils                   0.983 18507
##  5 Food Protection                               0.976 10897
##  6 Garbage and Refuse Disposal                   0.961  2032
##  7 Insect, Rodent, Animal Control                0.968  4824
##  8 Lighting                                      0.973  1431
##  9 Others                                        0.974  7397
## 10 Personnel                                     0.974  6085
## 11 Plumbing                                      0.960   348
## 12 Toilet and hand-washing facilities            0.970  8961
## 13 Water                                         0.942   658

Across all violation categories, there are high rates of reatuarants failing at least one inspection

finalviol_catyear <- finalviol_violcats %>%
  filter(anyfails > 0) %>%
  group_by(violcats, year) %>%
  summarize(avg_pcnt_fails = mean(pcnt_fails), count = n())

ggplot(finalviol_catyear, aes(year, avg_pcnt_fails, color = violcats)) +
  geom_line() +
  scale_x_continuous(name = "Year") +
  scale_y_continuous(name = "Percent of Inspections Failed", limits = c(0, 0.7))

Among restaurants that failed in at least one category, there were approximately

finalviol_repeat <- finalviol %>%
  group_by(name, DESCRIPT, ZIP, violcats) %>%
  summarize(mult_fails = sum(violstatus_fail)) %>%
  mutate(violrepeat = ifelse(mult_fails > 1, 1, 0))

Based on the data, the ten most common repeat offenders within the same violation category are: 1. Subway (02115) 2. Cosi (02110) 3. Dunkin Donuts (02125) 4. Dunkin Donuts (02128) 5. Dunkin Donuts (02122) 6. McDonalds (02135) 7. U Food Grill (02128) 8. Burger King (02128) 9. Dunkin Donuts RMG (02128) 10. Currito Burrito (02128)

Below attached an exploratory analysis on potentially different types of violations amongst Dunkin Donuts franchises. There does not seem to be a clear pattern here.

============================================================================================================ #Analysis based on geographic area Our goal in this case is to analyze the trend of health inspection violations based on the geographical area. From the heatmap below, we could see that most of the inspections happen in north part of Boston, around the airport area.

1. Upload Data

finalviol <- read.csv("https://raw.githubusercontent.com/plimasalle/Food_Project/master/finalviol.csv", header=T, na.strings=c(""," ","NA"))

2. Clean Data

For simplicity’s sake, I am looking at which zipcodes had the most inspections overall.

3. Gathered Geocodes

Please download the zipcode package in R.

library(zipcode)
data(zipcode)
zipcode <- zipcode %>% filter(state == "MA")

4. Generate a Map of Boston

Please download the ggmap package in R. Note: The output will be a url. To access the map, copy and paste the url but replace the xxx at the end of the url with the access code listed below.

if(!requireNamespace("devtools")) install.packages("devtools")
## Loading required namespace: devtools
devtools::install_github("dkahle/ggmap", ref = "tidyup")
## Skipping install of 'ggmap' from a github remote, the SHA1 (4dfe5165) has not changed since last install.
##   Use `force = TRUE` to force installation
library(ggmap)
## Google Maps API Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it: see citation("ggmap") for details.
register_google(key = "AIzaSyBJ7I6jURgmb0noA4zA4w6SZHsmjTneEoE", account_type = "premium", day_limit = 100000)
Bos_map <- ggmap(get_map(location = c(lon = -71.0589, lat = 42.3601), zoom=12, source = "google", col="bw"))
## Source : https://maps.googleapis.com/maps/api/staticmap?center=42.3601,-71.0589&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx

5. Create Shapefile of Boston

Please download the rgdal package in R. install.packages(“rgdal”, source= “https://trac.osgeo.org/gdal/wiki/DownloadSource”)

library(rgdal)
## Loading required package: sp
## rgdal: version: 1.3-6, (SVN revision 773)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-1
boston_zips <- readOGR("~/Desktop/ZIP_Codes", "ZIP_Codes")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/csiego1/Desktop/ZIP_Codes", layer: "ZIP_Codes"
## with 43 features
## It has 4 fields
## Integer64 fields read as strings:  OBJECTID
shape_to_ggplot <- function(shape){
  require(broom)
  gg_data <- tidy(shape)
  data <- slot(shape, "data")
  shape[["polyID"]] <- sapply(slot(shape, "polygons"), function(x) slot(x, "ID"))
  gg_data <- merge(gg_data, shape, by.x="id", by.y="polyID")
  return(gg_data)
}

boston_zips_ggplot <- shape_to_ggplot(boston_zips)
## Loading required package: broom
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Regions defined for each Polygons
colnames(boston_zips_ggplot)[2] <- "lon"

ggplot() + geom_polygon(data=boston_zips_ggplot, aes(x=lon, y=lat, group=group), fill="white", color="black")

6. Final Output

(a) Join map and shapefile dataframes by zipcode (b) Overlay the two maps together (c) Create heat map visualization (d) Add labels (NOTE: I wasn’t able to align the two maps perfectly, but I think the final output still works for this project)

geotrends$ZIP_2 <- paste0("0", as.character(geotrends$ZIP))

boston_zips_ggplot_2 <- left_join(boston_zips_ggplot, geotrends, by=c("ZIP5"="ZIP_2"))
## Warning: Column `ZIP5`/`ZIP_2` joining factor and character vector,
## coercing into character vector
polygon_map <- ggplot() + geom_polygon(data=boston_zips_ggplot_2, aes(x=lat, y=lon, group=group, fill=number_inspection), color="black", alpha=0.7) + geom_point(data=geotrends, aes(x=Lat, y=Long))

FINAL <- Bos_map + geom_polygon(data=boston_zips_ggplot_2, aes(y=lat, x=lon, group=group, fill=number_inspection), color="black", alpha=0.7) + scale_fill_gradient2("Number of Times Inspected (2006 - 2018)", low = "blue", mid = "white", high = "red", midpoint = 0) + ggtitle("Total Food Inspections in Boston Eateries (by zip code)") + xlim(-71.15, -70.95) + ylim(42.28, 42.44)
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
FINAL
## Warning: Removed 1 rows containing missing values (geom_rect).